Zhenwen Liang

Escape the Language Prior: Mitigating Late-Stage Modality Collapse in Audio Reasoning via Modality-Aware Policy Optimization

May 26, 2026

Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis

May 14, 2026

DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification

May 10, 2026

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

Apr 20, 2026

Deconstructing Multimodal Mathematical Reasoning: Towards a Unified Perception-Alignment-Reasoning Paradigm

Mar 09, 2026

Capability-Oriented Training Induced Alignment Risk

Feb 12, 2026

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Feb 04, 2026

Verified Critical Step Optimization for LLM Agents

Feb 03, 2026

Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning

Jan 26, 2026

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Dec 20, 2025